Table of contents

1. Introduction
2. Downloading and Installing Prerequisites
3. Importing python libraries
4. Loading Datasets
5. Finding country wise information on COVID-19
6. Global cases reported till Date
7. Country wise reported COVID-19 cases
8. Top 10 countries with COVID-19 cases
9. Correlation Analysis of COVID-19 cases
10. Visualization of COVID-19 global cases using Folium map
11. Visualization of COVID-19 in India
12. Correlation Analysis of COVID-19 cases in India
13. Visualization using Folium map for statewise COVID-19 cases in India
14. Data Preprocessing for COVID-19 cases prediction
15. Predictions
    1. Prediction cases for COVID-19 worldwide
    2. Prediction cases for COVID-19 in India
    3. Prediction cases for COVID-19 in US
16. Save the predicted results into excel file

Introduction

Coronaviruses are a family of viruses that cause illness ranging from the common cold to more severe disease. Middle East Respiratory Syndrome (MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV) are two severe examples the world has already faced. SARS-CoV-2 (the novel coronavirus) is a new member of the family, first identified in 2019 and not previously seen in humans. It is a contagious virus that was first reported in Wuhan in December 2019, and the WHO later declared the outbreak a pandemic because of its rapid spread across the world. As of 27 March 2020, it had caused 24K+ deaths worldwide, including 16K+ in Europe alone. As the pandemic spreads, understanding its trajectory becomes more important. This notebook analyzes the cumulative data on confirmed cases, deaths, and recoveries over time, with a focus on the global spread trend of the virus.

Downloading and Installing Prerequisites

In [1]:
!pip install pycountry_convert 
!pip install folium
#!wget https://raw.githubusercontent.com/tarunk04/COVID-19-CaseStudy-and-Predictions/master/models/model_deaths.h5
#!wget https://raw.githubusercontent.com/tarunk04/COVID-19-CaseStudy-and-Predictions/master/models/model_confirmed.h5
#!wget https://raw.githubusercontent.com/tarunk04/COVID-19-CaseStudy-and-Predictions/master/models/model_usa_c.h5

Importing python libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import ticker
import pycountry_convert as pc
import folium
!pip install branca
import branca
In [3]:
from datetime import datetime, timedelta, date
from scipy.interpolate import make_interp_spline, BSpline
!pip install plotly
import plotly.express as px
import json, requests
In [4]:
from keras.layers import Input, Dense, Activation, LeakyReLU
from keras import models
from keras.optimizers import RMSprop, Adam
Using TensorFlow backend.

Loading Datasets

In [5]:
df_confirmed = pd.read_csv('global_covid_confirmed_daily_updates.csv')
print('Global Covid-19 Confirmed:')
df_confirmed.head(5)
Global Covid-19 Confirmed:
Out[5]:
Country/Region 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 ... 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20
0 Afghanistan 0 0 0 0 0 0 0 0 0 ... 84 94 110 110 120 170 174 237 273 281
1 Albania 0 0 0 0 0 0 0 0 0 ... 146 174 186 197 212 223 243 259 277 304
2 Algeria 0 0 0 0 0 0 0 0 0 ... 302 367 409 454 511 584 716 847 986 1171
3 Andorra 0 0 0 0 0 0 0 0 0 ... 188 224 267 308 334 370 376 390 428 439
4 Angola 0 0 0 0 0 0 0 0 0 ... 3 4 4 5 7 7 7 8 8 8

5 rows × 74 columns

In [6]:
df_deaths = pd.read_csv('global_covid_deaths_daily_updates.csv')
print('Global Covid-19 Deaths:')
df_deaths.head(5)
Global Covid-19 Deaths:
Out[6]:
Country/Region 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 ... 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20
0 Afghanistan 0 0 0 0 0 0 0 0 0 ... 2 4 4 4 4 4 4 4 6 6
1 Albania 0 0 0 0 0 0 0 0 0 ... 5 6 8 10 10 11 15 15 16 17
2 Algeria 0 0 0 0 0 0 0 0 0 ... 21 25 26 29 31 35 44 58 86 105
3 Andorra 0 0 0 0 0 0 0 0 0 ... 1 3 3 3 6 8 12 14 15 16
4 Angola 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 2 2 2 2 2 2

5 rows × 74 columns

In [7]:
ts_confirmed = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
print('Time series of Covid-19 Confirmed: ')
ts_confirmed.head(5)
Time series of Covid-19 Confirmed: 
Out[7]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 ... 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20
0 NaN Afghanistan 33.0000 65.0000 0 0 0 0 0 0 ... 84 94 110 110 120 170 174 237 273 281
1 NaN Albania 41.1533 20.1683 0 0 0 0 0 0 ... 146 174 186 197 212 223 243 259 277 304
2 NaN Algeria 28.0339 1.6596 0 0 0 0 0 0 ... 302 367 409 454 511 584 716 847 986 1171
3 NaN Andorra 42.5063 1.5218 0 0 0 0 0 0 ... 188 224 267 308 334 370 376 390 428 439
4 NaN Angola -11.2027 17.8739 0 0 0 0 0 0 ... 3 4 4 5 7 7 7 8 8 8

5 rows × 77 columns

In [8]:
ts_confirmed.shape
Out[8]:
(258, 77)
In [9]:
ts_deaths = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
print('Time series of Covid-19 Deaths: ')
ts_deaths.head(5)
Time series of Covid-19 Deaths: 
Out[9]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 ... 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20
0 NaN Afghanistan 33.0000 65.0000 0 0 0 0 0 0 ... 2 4 4 4 4 4 4 4 6 6
1 NaN Albania 41.1533 20.1683 0 0 0 0 0 0 ... 5 6 8 10 10 11 15 15 16 17
2 NaN Algeria 28.0339 1.6596 0 0 0 0 0 0 ... 21 25 26 29 31 35 44 58 86 105
3 NaN Andorra 42.5063 1.5218 0 0 0 0 0 0 ... 1 3 3 3 6 8 12 14 15 16
4 NaN Angola -11.2027 17.8739 0 0 0 0 0 0 ... 0 0 0 0 2 2 2 2 2 2

5 rows × 77 columns

In [10]:
ts_deaths.shape
Out[10]:
(258, 77)
In [11]:
df_covid19 = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv")
print('Covid 19 cases World Dataset:')
df_covid19.head()
Covid 19 cases World Dataset:
Out[11]:
Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active
0 Australia 2020-04-04 10:52:42 -25.0000 133.0000 5550 30 701 4819
1 Austria 2020-04-04 10:46:32 47.5162 14.5501 11781 186 2507 9088
2 Canada 2020-04-04 10:52:27 60.0010 -95.0010 12545 188 2321 0
3 China 2020-04-04 09:38:19 30.5928 114.3055 82543 3330 76942 2271
4 Denmark 2020-04-04 10:46:32 56.0000 10.0000 3948 139 1289 2520
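
The JHU time-series files loaded above are in wide format: one row per province or country and one column per date, with headers such as 1/22/20. The following is a minimal sketch of reshaping such a file into long (country, date, count) records using only the standard library. The function name wide_to_long and the two-row sample are illustrative and not part of the notebook; the sample values are copied from the confirmed table above.

```python
import csv
import io
from datetime import datetime

# Tiny sample in the same wide layout as the JHU files.
SAMPLE = """Province/State,Country/Region,Lat,Long,4/1/20,4/2/20,4/3/20
,Afghanistan,33.0,65.0,237,273,281
,Albania,41.1533,20.1683,259,277,304
"""

# Columns that describe the row rather than hold a daily count.
ID_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}

def wide_to_long(text):
    """Return a list of (country, date, count) tuples from wide-format CSV text."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            # Headers are month/day/two-digit year, so 4/1/20 is 1 April 2020.
            day = datetime.strptime(column, "%m/%d/%y").date()
            records.append((row["Country/Region"], day, int(value)))
    return records

records = wide_to_long(SAMPLE)
print(records[0])
```

In pandas the same reshaping is usually done with DataFrame.melt; the long form is convenient when dates need to be treated as real date values rather than column labels.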

Finding country wise information on COVID-19

In [12]:
df_countries_cases = df_covid19.copy().drop(['Lat','Long_','Last_Update'],axis =1)
In [13]:
df_countries_cases.head()
Out[13]:
Country_Region Confirmed Deaths Recovered Active
0 Australia 5550 30 701 4819
1 Austria 11781 186 2507 9088
2 Canada 12545 188 2321 0
3 China 82543 3330 76942 2271
4 Denmark 3948 139 1289 2520
In [14]:
df_countries_cases.index = df_countries_cases["Country_Region"]
In [15]:
df_countries_cases = df_countries_cases.drop("Country_Region", axis = 1)
df_countries_cases.head()
Out[15]:
Confirmed Deaths Recovered Active
Country_Region
Australia 5550 30 701 4819
Austria 11781 186 2507 9088
Canada 12545 188 2321 0
China 82543 3330 76942 2271
Denmark 3948 139 1289 2520

Global cases reported till Date

Total number of Confirmed, Deaths, Recovered and Active cases across the globe

In [16]:
globalcases_report = pd.DataFrame(df_countries_cases.sum()).transpose().style.background_gradient(cmap='Wistia',axis=1)
globalcases_report
Out[16]:
Confirmed Deaths Recovered Active
0 1131713 59884 233591 566800

Country wise reported COVID-19 cases

In [17]:
df_countries_cases.sort_values('Confirmed', ascending= False).style.background_gradient(cmap='Wistia')
Out[17]:
Confirmed Deaths Recovered Active
Country_Region
US 278458 7159 9897 0
Spain 124736 11744 34219 78773
Italy 119827 14681 19758 85388
Germany 91159 1275 24575 65309
France 83029 6520 14135 62374
China 82543 3330 76942 2271
Iran 55743 3452 19736 32555
United Kingdom 38697 3611 209 34877
Turkey 20921 425 484 20012
Switzerland 19702 604 4846 14252
Belgium 18431 1283 3247 13901
Netherlands 15821 1492 260 14069
Canada 12545 188 2321 0
Austria 11781 186 2507 9088
Korea, South 10156 177 6325 3654
Portugal 9886 246 68 9572
Brazil 9216 365 127 8724
Israel 7589 43 427 7119
Sweden 6131 358 205 5568
Australia 5550 30 701 4819
Norway 5370 59 32 5279
Russia 4731 43 333 4355
Ireland 4273 120 25 4128
Czechia 4194 56 74 4064
Denmark 3948 139 1289 2520
Chile 3737 22 427 3288
Romania 3613 141 329 3143
Poland 3503 73 116 3314
Malaysia 3483 57 915 2511
Ecuador 3368 145 65 3158
Philippines 3094 144 57 2893
India 3082 86 229 2767
Japan 2935 69 514 2352
Pakistan 2708 40 130 2538
Luxembourg 2612 31 500 2081
Indonesia 2092 191 150 1751
Thailand 2067 20 612 1435
Saudi Arabia 2039 25 351 1663
Finland 1882 20 300 1562
Mexico 1688 60 633 995
Panama 1673 41 10 1622
Greece 1613 67 78 1468
Peru 1595 61 537 997
South Africa 1505 9 95 1401
Dominican Republic 1488 68 16 1404
Serbia 1476 39 0 1437
Iceland 1364 4 336 1024
Argentina 1353 42 266 1045
Colombia 1267 25 55 1187
United Arab Emirates 1264 9 108 1147
Algeria 1171 105 62 1004
Singapore 1114 6 282 826
Ukraine 1096 28 23 1045
Croatia 1079 12 92 975
Qatar 1075 3 93 979
Estonia 1018 13 59 946
Egypt 985 66 216 703
Slovenia 977 22 70 885
New Zealand 950 1 127 822
Morocco 844 50 59 735
Iraq 820 54 226 540
Lithuania 771 9 7 755
Armenia 770 7 43 720
Diamond Princess 712 11 619 82
Bahrain 688 4 399 285
Hungary 678 32 58 588
Bosnia and Herzegovina 615 19 27 569
Moldova 591 8 26 557
Kazakhstan 525 5 36 484
Lebanon 520 17 50 453
Latvia 509 1 1 507
Cameroon 509 8 17 484
Bulgaria 498 15 34 449
Tunisia 495 18 5 472
Kuwait 479 1 93 385
Slovakia 471 1 10 460
Azerbaijan 443 5 32 406
Andorra 439 16 16 407
North Macedonia 430 12 20 398
Costa Rica 416 2 11 403
Cyprus 396 11 28 357
Uruguay 386 4 86 296
Belarus 351 4 53 294
Taiwan* 348 5 50 293
Albania 332 18 99 215
Jordan 310 5 58 247
Burkina Faso 302 16 50 236
Afghanistan 299 7 10 282
Oman 277 1 61 215
Cuba 269 6 15 248
Honduras 264 15 3 246
San Marino 251 32 26 193
Uzbekistan 241 2 25 214
Vietnam 239 0 90 149
Cote d'Ivoire 218 1 19 198
Malta 213 0 2 211
Nigeria 210 4 25 181
Senegal 207 1 66 140
Ghana 205 5 31 169
West Bank and Gaza 205 1 21 183
Montenegro 197 2 1 194
Mauritius 196 7 0 189
Sri Lanka 159 5 25 129
Georgia 156 1 28 127
Venezuela 153 7 52 94
Congo (Kinshasa) 148 16 3 129
Kyrgyzstan 144 1 9 134
Bolivia 139 10 1 128
Brunei 135 1 66 68
Kosovo 126 1 10 115
Kenya 122 4 4 114
Niger 120 5 0 115
Cambodia 114 0 50 64
Trinidad and Tobago 100 6 1 93
Paraguay 96 3 12 81
Rwanda 89 0 0 89
Liechtenstein 75 0 0 75
Guinea 73 0 2 71
Bangladesh 70 8 30 32
Madagascar 70 0 0 70
Monaco 64 1 3 60
El Salvador 56 3 0 53
Jamaica 53 3 7 43
Barbados 51 0 0 51
Guatemala 50 1 12 37
Djibouti 49 0 8 41
Uganda 48 0 0 48
Togo 40 3 17 20
Zambia 39 1 2 36
Mali 39 3 0 36
Ethiopia 38 0 3 35
Bahamas 24 3 0 21
Guyana 23 4 0 19
Eritrea 22 0 0 22
Congo (Brazzaville) 22 2 2 18
Gabon 21 1 1 19
Tanzania 20 1 3 16
Burma 20 1 0 19
Maldives 19 0 13 6
Haiti 18 0 1 17
Libya 17 1 0 16
Benin 16 0 2 14
Syria 16 2 2 12
Equatorial Guinea 16 0 1 15
Guinea-Bissau 15 0 0 15
Antigua and Barbuda 15 0 0 15
Mongolia 14 0 2 12
Dominica 14 0 0 14
Namibia 14 0 3 11
Saint Lucia 13 0 1 12
Fiji 12 0 0 12
Grenada 12 0 0 12
Laos 10 0 0 10
Seychelles 10 0 0 10
Mozambique 10 0 0 10
Suriname 10 1 0 9
Sudan 10 2 2 6
Nepal 9 0 1 8
Zimbabwe 9 1 0 8
MS Zaandam 9 2 0 7
Chad 9 0 0 9
Eswatini 9 0 0 9
Saint Kitts and Nevis 9 0 0 9
Central African Republic 8 0 0 8
Angola 8 2 1 5
Holy See 7 0 0 7
Liberia 7 0 0 7
Saint Vincent and the Grenadines 7 0 1 6
Somalia 7 0 1 6
Cabo Verde 6 1 0 5
Mauritania 6 1 2 3
Nicaragua 5 1 0 4
Bhutan 5 0 2 3
Botswana 4 1 0 3
Gambia 4 1 2 1
Belize 4 0 0 4
Malawi 3 0 0 3
Burundi 3 0 0 3
Sierra Leone 2 0 0 2
Timor-Leste 1 0 0 1
Papua New Guinea 1 0 0 1
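
In the table above the US and Canada report 0 active cases even though their confirmed counts far exceed deaths plus recoveries, which suggests the Active field was not populated for those rows. The sketch below recomputes it as Confirmed minus Deaths minus Recovered, on the assumption that this is how Active is meant to be defined. The helper name derived_active is hypothetical and the three rows are copied from the table.

```python
# (confirmed, deaths, recovered, reported_active), copied from the table above
rows = {
    "US": (278458, 7159, 9897, 0),
    "Spain": (124736, 11744, 34219, 78773),
    "Canada": (12545, 188, 2321, 0),
}

def derived_active(confirmed, deaths, recovered):
    """Active cases implied by the other three columns."""
    return confirmed - deaths - recovered

for country, (confirmed, deaths, recovered, reported) in rows.items():
    implied = derived_active(confirmed, deaths, recovered)
    flag = "" if implied == reported else "  <- reported value differs"
    print(f"{country}: reported={reported}, implied={implied}{flag}")
```

The implied values for the US and Canada add up to 271438, which is the gap between the global Active total (566800) and global Confirmed minus Deaths minus Recovered (838238).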

Top 10 countries with COVID-19 cases

In [18]:
f = plt.figure(figsize=(10,5))
f.add_subplot(111)

plt.axes(axisbelow=True)
plt.barh(df_countries_cases.sort_values('Confirmed')["Confirmed"].index[-10:],df_countries_cases.sort_values('Confirmed')["Confirmed"].values[-10:],color="orange")
plt.tick_params(size=5,labelsize = 13)
plt.xlabel("Confirmed Cases",fontsize=18)
plt.title("Top 10 Countries (Confirmed Cases)",fontsize=20)
plt.grid(alpha=0.3)
In [19]:
f = plt.figure(figsize=(10,5))
f.add_subplot(111)

plt.axes(axisbelow=True)
plt.barh(df_countries_cases.sort_values('Deaths')["Deaths"].index[-10:],df_countries_cases.sort_values('Deaths')["Deaths"].values[-10:],color="brown")
plt.tick_params(size=5,labelsize = 13)
plt.xlabel("Death Cases",fontsize=18)
plt.title("Top 10 Countries (Death Cases)",fontsize=20)
plt.grid(alpha=0.3)
In [20]:
f = plt.figure(figsize=(10,5))
f.add_subplot(111)

plt.axes(axisbelow=True)
plt.barh(df_countries_cases.sort_values('Recovered')["Recovered"].index[-10:],df_countries_cases.sort_values('Recovered')["Recovered"].values[-10:],color="green")
plt.tick_params(size=5,labelsize = 13)
plt.xlabel("Recovered Cases",fontsize=18)
plt.title("Top 10 Countries (Recovered Cases)",fontsize=20)
plt.grid(alpha=0.3)
In [21]:
f = plt.figure(figsize=(10,5))
f.add_subplot(111)

plt.axes(axisbelow=True)
plt.barh(df_countries_cases.sort_values('Active')["Active"].index[-10:],df_countries_cases.sort_values('Active')["Active"].values[-10:],color="Darkcyan")
plt.tick_params(size=5,labelsize = 13)
plt.xlabel("Active Cases",fontsize=18)
plt.title("Top 10 Countries (Active Cases)",fontsize=20)
plt.grid(alpha=0.3)
#plt.savefig(out+'Top 10 Countries (Confirmed Cases).png')
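
The four plotting cells above repeat the same sort-and-slice step. One possible way to factor it out is a small helper that returns the n largest entries in ascending order, which is the order plt.barh needs to draw the largest bar at the top. This is a sketch on a plain dict rather than the notebook's DataFrame; top_n is a hypothetical name and the counts are copied from the country table.

```python
import heapq

confirmed = {
    "US": 278458,
    "Spain": 124736,
    "Italy": 119827,
    "Germany": 91159,
    "France": 83029,
}

def top_n(counts, n=10):
    """Return (labels, values) for the n largest counts, smallest first."""
    largest = heapq.nlargest(n, counts.items(), key=lambda item: item[1])
    largest.reverse()  # ascending order, so barh puts the largest bar at the top
    labels = [name for name, _ in largest]
    values = [value for _, value in largest]
    return labels, values

labels, values = top_n(confirmed, n=3)
print(labels, values)
# The pair can be passed to matplotlib, e.g. plt.barh(labels, values, color="orange")
```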

Correlation Analysis of COVID-19 cases

In [22]:
df_countries_cases.corr().style.background_gradient(cmap='Reds')
Out[22]:
Confirmed Deaths Recovered Active
Confirmed 1 0.79451 0.574788 0.592723
Deaths 0.79451 1 0.570444 0.837853
Recovered 0.574788 0.570444 1 0.502612
Active 0.592723 0.837853 0.502612 1
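
DataFrame.corr() computes the Pearson correlation coefficient by default. As a reminder of what each cell in the matrix measures, here is a dependency-free sketch of the formula. The function name pearson is illustrative; the two input lists are the Confirmed and Deaths counts for Spain, Italy, Germany, France and China from the table above, so the result describes only those five rows and not the full matrix.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

confirmed = [124736, 119827, 91159, 83029, 82543]
deaths = [11744, 14681, 1275, 6520, 3330]
print(round(pearson(confirmed, deaths), 3))
```

Because the US and Canada rows report Active as 0 in the country table, correlations that involve the Active column should be read with some caution.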

Visualization of COVID-19 global cases using Folium map

In [23]:
world_map = folium.Map(location=[10,0], tiles="cartodbpositron", zoom_start=2,max_zoom=6,min_zoom=2)
for i in range(0,len(ts_confirmed)):
    folium.Circle(
        location=[ts_confirmed.iloc[i]['Lat'], ts_confirmed.iloc[i]['Long']],
        tooltip = "<h5 style='text-align:center;font-weight: bold'>"+ts_confirmed.iloc[i]['Country/Region']+"</h5>"+
                    "<div style='text-align:center;'>"+str(np.nan_to_num(ts_confirmed.iloc[i]['Province/State']))+"</div>"+
                    "<hr style='margin:10px;'>"+
                    "<ul style='color: #444;list-style-type:circle;align-item:left;padding-left:20px;padding-right:20px'>"+
        "<li>Confirmed: "+str(ts_confirmed.iloc[i,-1])+"</li>"+
        "<li>Deaths:   "+str(ts_deaths.iloc[i,-1])+"</li>"+
        "<li>Mortality Rate:   "+str(np.round(ts_deaths.iloc[i,-1]/(ts_confirmed.iloc[i,-1]+1.00001)*100,2))+"</li>"+
        "</ul>"
        ,
        radius=(int((np.log(ts_confirmed.iloc[i,-1]+1.00001)))+0.2)*50000,
        color='#ff6600',
        fill_color='#ff8533',
        fill=True).add_to(world_map)

world_map
Out[23]:
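
The two expressions embedded in the folium.Circle call are easier to check in isolation. The sketch below restates them as plain functions, with math.log standing in for np.log; the function names are illustrative. The inputs are Afghanistan's 4/3/20 figures (281 confirmed, 6 deaths) and Angola's (8 confirmed, 2 deaths) from the tables above.

```python
import math

def circle_radius(confirmed):
    """Circle radius in metres: steps up with the integer part of ln(confirmed)."""
    return (int(math.log(confirmed + 1.00001)) + 0.2) * 50000

def mortality_rate(deaths, confirmed):
    """Deaths per 100 confirmed cases, with the same +1.00001 guard against division by zero."""
    return round(deaths / (confirmed + 1.00001) * 100, 2)

print(circle_radius(281))      # Afghanistan
print(circle_radius(0))        # a location with no cases still gets a small circle
print(mortality_rate(6, 281))  # Afghanistan
print(mortality_rate(2, 8))    # Angola
```

Two side effects are worth knowing. Because int() truncates the logarithm, all counts whose natural log shares the same integer part get the same radius. The +1.00001 in the denominator also lowers the displayed rate for small counts: Angola shows 22.22 rather than the 25.0 that 2 deaths out of 8 cases would give.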
In [24]:
temp_df = pd.DataFrame(df_countries_cases['Confirmed'])
temp_df = temp_df.reset_index()
fig = px.choropleth(temp_df, locations="Country_Region",
                    color=np.log10(temp_df.iloc[:,-1]), # color by log10 of confirmed cases
                    hover_name="Country_Region", # column to add to hover information
                    hover_data=["Confirmed"],
                    color_continuous_scale=px.colors.sequential.Plasma,locationmode="country names")
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(title_text="Confirmed Cases across globe")
fig.update_coloraxes(colorbar_title="Confirmed Cases across globe",colorscale="Reds")
# fig.to_image("Global Heat Map confirmed.png")
fig.show()
In [25]:
temp_df = pd.DataFrame(df_countries_cases['Deaths'])
temp_df = temp_df.reset_index()
fig = px.choropleth(temp_df, locations="Country_Region",
                    color=np.log10(temp_df.iloc[:,-1] + 1), # log10(deaths + 1); the +1 avoids log10(0) for countries with no deaths
                    hover_name="Country_Region", # column to add to hover information
                    hover_data=["Deaths"],
                    color_continuous_scale=px.colors.sequential.Plasma,locationmode="country names")
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(title_text="Death Cases across globe")
fig.update_coloraxes(colorbar_title="Death Cases across globe",colorscale="Reds")
# fig.to_image("Global Heat Map deaths.png")
fig.show()

Visualization of COVID-19 in India

In [26]:
india_data_json = requests.get('https://api.rootnet.in/covid19-in/unofficial/covid19india.org/statewise').json()
# pd.json_normalize is available from pandas 1.0; older versions use pd.io.json.json_normalize
df_india_cases = pd.json_normalize(india_data_json['data']['statewise'])
df_india_cases = df_india_cases.set_index("state")
df_india_cases.head()
Out[26]:
confirmed recovered deaths active
state
Maharashtra 537 50 26 461
Tamil Nadu 411 6 1 404
Delhi 386 8 6 372
Kerala 295 42 2 251
Telangana 229 32 11 186
In [27]:
total = df_india_cases.sum()
total.name = "Total"
pd.DataFrame(total).transpose().style.background_gradient(cmap='Wistia',axis=1)
Out[27]:
confirmed recovered deaths active
Total 3229 229 86 2914
In [28]:
df_india_cases.style.background_gradient(cmap='Wistia')
Out[28]:
confirmed recovered deaths active
state
Maharashtra 537 50 26 461
Tamil Nadu 411 6 1 404
Delhi 386 8 6 372
Kerala 295 42 2 251
Telangana 229 32 11 186
Rajasthan 198 3 0 195
Uttar Pradesh 174 17 2 155
Andhra Pradesh 180 2 1 177
Madhya Pradesh 154 0 8 146
Karnataka 128 11 4 113
Gujarat 105 10 9 86
Jammu and Kashmir 78 3 2 73
Haryana 76 27 0 49
West Bengal 53 3 6 44
Punjab 57 1 5 51
Bihar 31 3 1 27
Assam 25 0 0 25
Odisha 20 2 0 18
Chandigarh 18 0 0 18
Uttarakhand 16 2 0 14
Ladakh 14 3 0 11
Andaman and Nicobar Islands 10 0 0 10
Chhattisgarh 10 3 0 7
Himachal Pradesh 6 1 2 3
Goa 7 0 0 7
Puducherry 5 0 0 5
Jharkhand 2 0 0 2
Manipur 2 0 0 2
Mizoram 1 0 0 1
Arunachal Pradesh 1 0 0 1
Dadra and Nagar Haveli 0 0 0 0
Daman and Diu 0 0 0 0
Lakshadweep 0 0 0 0
Meghalaya 0 0 0 0
Nagaland 0 0 0 0
Sikkim 0 0 0 0
Tripura 0 0 0 0
In [29]:
f = plt.figure(figsize=(10,5))
f.add_subplot(111)

plt.axes(axisbelow=True)
plt.barh(df_india_cases.sort_values('confirmed')["confirmed"].index[-10:],df_india_cases.sort_values('confirmed')["confirmed"].values[-10:],color="orange")
plt.tick_params(size=5,labelsize = 13)
plt.xlabel("Confirmed Cases",fontsize=18)
plt.title("Top 10 States in India (Confirmed Cases)",fontsize=20)
plt.grid(alpha=0.3)
#plt.savefig(out+'Top 10 States_India (Confirmed Cases).png')
In [30]:
f = plt.figure(figsize=(10,5))
f.add_subplot(111)

plt.axes(axisbelow=True)
plt.barh(df_india_cases.sort_values('deaths')["deaths"].index[-10:],df_india_cases.sort_values('deaths')["deaths"].values[-10:],color="brown")
plt.tick_params(size=5,labelsize = 13)
plt.xlabel("Death Cases",fontsize=18)
plt.title("Top 10 States in India (Deaths Cases)",fontsize=20)
plt.grid(alpha=0.3)
#plt.savefig(out+'Top 10 States_India (Confirmed Cases).png')

Correlation Analysis of COVID-19 cases in India

In [31]:
df_india_cases.corr().style.background_gradient(cmap='Reds')
Out[31]:
confirmed recovered deaths active
confirmed 1 0.710931 0.69075 0.996345
recovered 0.710931 1 0.680555 0.651387
deaths 0.69075 0.680555 1 0.648155
active 0.996345 0.651387 0.648155 1

Visualization using Folium map for statewise COVID-19 cases in India

In [32]:
# Adding states geolocation(Latitude,Longitude) data to India dataset
geolocations = {
    "Kerala" : [10.8505,76.2711],
    "Maharashtra" : [19.7515,75.7139],
    "Karnataka": [15.3173,75.7139],
    "Telangana": [18.1124,79.0193],
    "Uttar Pradesh": [26.8467,80.9462],
    "Rajasthan": [27.0238,74.2179],
    "Gujarat":[22.2587,71.1924],
    "Delhi" : [28.7041,77.1025],
    "Punjab":[31.1471,75.3412],
    "Tamil Nadu": [11.1271,78.6569],
    "Haryana": [29.0588,76.0856],
    "Madhya Pradesh":[22.9734,78.6569],
    "Jammu and Kashmir":[33.7782,76.5762],
    "Ladakh": [34.1526,77.5770],
    "Andhra Pradesh":[15.9129,79.7400],
    "West Bengal": [22.9868,87.8550],
    "Bihar": [25.0961,85.3131],
    "Chhattisgarh":[21.2787,81.8661],
    "Chandigarh":[30.7333,76.7794],
    "Uttarakhand":[30.0668,79.0193],
    "Himachal Pradesh":[31.1048,77.1734],
    "Goa": [15.2993,74.1240],
    "Odisha":[20.9517,85.0985],
    "Andaman and Nicobar Islands": [11.7401,92.6586],
    "Puducherry":[11.9416,79.8083],
    "Manipur":[24.6637,93.9063],
    "Mizoram":[23.1645,92.9376],
    "Assam":[26.2006,92.9376],
    "Meghalaya":[25.4670,91.3662],
    "Tripura":[23.9408,91.9882],
    "Arunachal Pradesh":[28.2180,94.7278],
    "Jharkhand" : [23.6102,85.2799],
    "Nagaland": [26.1584,94.5624],
    "Sikkim": [27.5330,88.5122],
    "Dadra and Nagar Haveli":[20.1809,73.0169],
    "Lakshadweep":[10.5667,72.6417],
    "Daman and Diu":[20.4283,72.8397]    
}
df_india_cases["Lat"] = ""
df_india_cases["Long"] = ""
for index in df_india_cases.index :
    df_india_cases.loc[df_india_cases.index == index,"Lat"] = geolocations[index][0]
    df_india_cases.loc[df_india_cases.index == index,"Long"] = geolocations[index][1]
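
The loop above indexes geolocations[index] directly, so any state name returned by the API that is missing from the dictionary raises a KeyError. A small sketch of checking for gaps before the lookup follows. The helper names and the "State Unassigned" entry are hypothetical; the two coordinate pairs are taken from the dictionary above.

```python
geolocations = {
    "Kerala": [10.8505, 76.2711],
    "Delhi": [28.7041, 77.1025],
}

def missing_states(states, lookup):
    """Return the states that have no entry in the coordinate lookup."""
    return [state for state in states if state not in lookup]

def coordinates(state, lookup, default=(None, None)):
    """Latitude/longitude for a state, or a default pair when it is unknown."""
    lat, lon = lookup.get(state, default)
    return lat, lon

states_from_api = ["Kerala", "Delhi", "State Unassigned"]  # last name is hypothetical
print(missing_states(states_from_api, geolocations))
print(coordinates("Kerala", geolocations))
```

Reporting the missing names up front makes it clear which entries need to be added to the dictionary, rather than failing partway through the loop.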
In [33]:
# url = "https://raw.githubusercontent.com/Subhash9325/GeoJson-Data-of-Indian-States/master/Indian_States"
# state_json = requests.get(url).json()
india = folium.Map(location=[23,80], zoom_start=4,max_zoom=10, tiles='OpenStreetMap', attr= "INDIA" ,min_zoom=4,height=500,width="80%")
# Select the states with at least one confirmed case, then iterate over that selection
# (indexing the unfiltered frame by position only works if the zero-case rows come last).
df_india_plot = df_india_cases[df_india_cases['confirmed']>0]
for i in range(0,len(df_india_plot)):
    row = df_india_plot.iloc[i]
    folium.Circle(
        location=[row['Lat'], row['Long']],
        tooltip = "<h5 style='text-align:center;font-weight: bold'>"+row.name+"</h5>"+
                    "<hr style='margin:10px;'>"+
                    "<ul style='color: #444;list-style-type:circle;align-item:left;padding-left:20px;padding-right:20px'>"+
        "<li>Confirmed: "+str(row['confirmed'])+"</li>"+
        "<li>Active:   "+str(row['active'])+"</li>"+
        "<li>Recovered:   "+str(row['recovered'])+"</li>"+
        "<li>Deaths:   "+str(row['deaths'])+"</li>"+
        "<li>Mortality Rate:   "+str(np.round(row['deaths']/(row['confirmed']+1)*100,2))+"</li>"+
        "</ul>"
        ,
        radius=(int(np.log2(row['confirmed']+1)))*15000,
        color='#ff6600',
        fill_color='#ff8533',
        fill=True).add_to(india)

india
Out[33]:

Data Preprocessing for COVID-19 cases prediction

In [34]:
grp_confirm_data_df = ts_confirmed.groupby('Country/Region', as_index=False).sum()
grp_confirm_data_df.reset_index(inplace=True, drop=True)
grp_confirm_data_df.shape
Out[34]:
(181, 76)
In [35]:
grp_death_data_df = ts_deaths.groupby('Country/Region', as_index=False).sum()
grp_death_data_df.reset_index(inplace=True, drop=True)
grp_death_data_df.shape
Out[35]:
(181, 76)
In [36]:
grp_confirm_data_df.head()
Out[36]:
Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 ... 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20
0 Afghanistan 33.0000 65.0000 0 0 0 0 0 0 0 ... 84 94 110 110 120 170 174 237 273 281
1 Albania 41.1533 20.1683 0 0 0 0 0 0 0 ... 146 174 186 197 212 223 243 259 277 304
2 Algeria 28.0339 1.6596 0 0 0 0 0 0 0 ... 302 367 409 454 511 584 716 847 986 1171
3 Andorra 42.5063 1.5218 0 0 0 0 0 0 0 ... 188 224 267 308 334 370 376 390 428 439
4 Angola -11.2027 17.8739 0 0 0 0 0 0 0 ... 3 4 4 5 7 7 7 8 8 8

5 rows × 76 columns

In [37]:
grp_death_data_df.head()
Out[37]:
Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 ... 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20
0 Afghanistan 33.0000 65.0000 0 0 0 0 0 0 0 ... 2 4 4 4 4 4 4 4 6 6
1 Albania 41.1533 20.1683 0 0 0 0 0 0 0 ... 5 6 8 10 10 11 15 15 16 17
2 Algeria 28.0339 1.6596 0 0 0 0 0 0 0 ... 21 25 26 29 31 35 44 58 86 105
3 Andorra 42.5063 1.5218 0 0 0 0 0 0 0 ... 1 3 3 3 6 8 12 14 15 16
4 Angola -11.2027 17.8739 0 0 0 0 0 0 0 ... 0 0 0 0 2 2 2 2 2 2

5 rows × 76 columns

In [38]:
country_df = grp_confirm_data_df["Country/Region"]
country_df.shape
Out[38]:
(181,)
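
The groupby('Country/Region').sum() step collapses province-level rows into one row per country, which is why the row count drops from 258 to 181. It also sums the Lat and Long columns, so those coordinates are no longer meaningful for countries that had several province rows. Below is a dependency-free sketch of the same aggregation; the province names, country names and counts are made-up illustration values, not real data, and sum_by_country is a hypothetical helper.

```python
# (province, country, counts per date); toy values for illustration only
rows = [
    ("Province A", "Country X", [1, 2, 4]),
    ("Province B", "Country X", [0, 3, 5]),
    (None, "Country Y", [7, 8, 9]),
]

def sum_by_country(rows):
    """Add up the per-date counts of all rows that share a country."""
    totals = {}
    for _province, country, counts in rows:
        if country not in totals:
            totals[country] = list(counts)
        else:
            totals[country] = [a + b for a, b in zip(totals[country], counts)]
    return totals

totals = sum_by_country(rows)
print(totals)
```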

Predictions

1. Prediction cases for COVID-19 worldwide

Use Moving Average and Exponentially Weighted Moving Average (EWMA) models

In [39]:
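def cal_moving_avg(input_df, window=2):
    """Rolling median across the date columns, read off at 4/1/20.

    For the default window of 2 the median equals the mean, so this matches a 2-day moving average.
    """
    _df = input_df.rolling(window, axis=1).median()['4/1/20']

    return _df

def cal_ewma(input_df, comm=0.3):
    """Exponentially weighted mean with centre of mass `comm`, read off at 4/1/20."""
    # Note: ewm() works along axis=0 (down the rows) by default, so this smooths
    # across countries in row order rather than across dates within one country.
    _df = input_df.ewm(com=comm).mean()['4/1/20']

    return _df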
In [40]:
all_pred_df = pd.DataFrame()
all_pred_df_ew = pd.DataFrame()

for df in [grp_confirm_data_df, grp_death_data_df]:
    _df = cal_moving_avg(df, window=2)
    ew_df = cal_ewma(df, comm=0.4)
    all_pred_df = pd.concat([all_pred_df, _df], axis=1)
    all_pred_df_ew = pd.concat([all_pred_df_ew, ew_df], axis=1)
    
all_pred_df.columns =['Confirmed', 'Deaths']    
all_pred_df_ew.columns =['Confirmed', 'Deaths']   
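
To make the two estimators concrete, the sketch below applies them along the time axis of a single country. It uses India's daily confirmed counts for 3/28/20 to 4/1/20 as they appear in this notebook. The names rolling_median and ewma are illustrative; ewma follows the weighting pandas uses with adjust=True, where alpha = 1 / (1 + com) and each value is normalised by the sum of the weights.

```python
from statistics import median

series = [987, 1024, 1251, 1397, 1998]  # India, confirmed, 3/28/20 .. 4/1/20

def rolling_median(values, window=2):
    """Median of the last `window` values at each position (None until the window is full)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(median(values[i + 1 - window:i + 1]))
    return out

def ewma(values, com=0.4):
    """Exponentially weighted mean over time, normalised by the sum of the weights."""
    alpha = 1.0 / (1.0 + com)
    out = []
    num = 0.0  # running sum of (1 - alpha)^i * x_(t-i)
    den = 0.0  # running sum of (1 - alpha)^i
    for x in values:
        num = x + (1.0 - alpha) * num
        den = 1.0 + (1.0 - alpha) * den
        out.append(num / den)
    return out

print(rolling_median(series)[-1])
print(ewma(series)[-1])
```

Both quantities summarise values already observed up to 4/1/20 and do not extrapolate beyond that date, so they are perhaps better read as smoothed estimates than as forecasts.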
In [41]:
print('\n')
print('COVID-19 cases worldwide till 1/4/2020 using MA:')
all_pred_df

COVID-19 cases worldwide till 1/4/2020 using MA:
Out[41]:
Confirmed Deaths
0 205.5 4.0
1 251.0 15.0
2 781.5 51.0
3 383.0 13.0
4 7.5 2.0
... ... ...
176 139.0 3.0
177 215.0 0.0
178 126.5 1.0
179 35.5 0.0
180 8.0 1.0

181 rows × 2 columns

In [42]:
print('\n')
print('COVID-19 cases worldwide till 1/4/2020 using EWMA:')
all_pred_df_ew

COVID-19 cases worldwide till 1/4/2020 using EWMA:
Out[42]:
Confirmed Deaths
0 237.000000 4.000000
1 254.111111 12.555556
2 687.716418 45.791045
3 473.635220 22.930818
4 140.404173 7.951714
... ... ...
176 685.494070 42.536915
177 351.569734 12.153404
178 196.162781 4.186687
179 81.760795 1.196196
180 29.074513 1.056056

181 rows × 2 columns

In [43]:
pred_1_df = pd.merge(country_df, all_pred_df, left_index= True, right_index = True)
pred_1_df.head()
Out[43]:
Country/Region Confirmed Deaths
0 Afghanistan 205.5 4.0
1 Albania 251.0 15.0
2 Algeria 781.5 51.0
3 Andorra 383.0 13.0
4 Angola 7.5 2.0
In [44]:
pred_2_df = pd.merge(country_df, all_pred_df_ew, left_index= True, right_index = True)
pred_2_df.head()
Out[44]:
Country/Region Confirmed Deaths
0 Afghanistan 237.000000 4.000000
1 Albania 254.111111 12.555556
2 Algeria 687.716418 45.791045
3 Andorra 473.635220 22.930818
4 Angola 140.404173 7.951714
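
The merges above attach country names to predictions purely by row index. A minimal sketch of that alignment, using the first three rows of Out[43] as sample data (the variable names `names`, `preds`, `merged` and `joined` are illustrative, not from the notebook):

```python
import pandas as pd

# Sample rows copied from Out[43]; the variable names are hypothetical.
names = pd.Series(["Afghanistan", "Albania", "Algeria"], name="Country/Region")
preds = pd.DataFrame(
    {"Confirmed": [205.5, 251.0, 781.5], "Deaths": [4.0, 15.0, 51.0]}
)

# Index-aligned merge: row i of `names` is paired with row i of `preds`.
merged = pd.merge(names, preds, left_index=True, right_index=True)

# For identical indexes, a column-wise concat gives the same pairing.
joined = pd.concat([names, preds], axis=1)

print(merged)
```

This pairing is only meaningful when both objects share the same index, as they do here because both derive from the same grouped frame.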

2. Prediction cases for COVID-19 in India

In [45]:
ind_conf = grp_confirm_data_df[grp_confirm_data_df['Country/Region'] == 'India']
ind_conf
Out[45]:
Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 ... 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20
78 India 21.0 78.0 0 0 0 0 0 0 0 ... 657 727 887 987 1024 1251 1397 1998 2543 2567

1 rows × 76 columns

In [46]:
ind_deaths = grp_death_data_df[grp_death_data_df['Country/Region'] == 'India']
ind_deaths
Out[46]:
Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 ... 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20
78 India 21.0 78.0 0 0 0 0 0 0 0 ... 12 20 20 24 27 32 35 58 72 72

1 rows × 76 columns

Use Moving Average and EWMA models

In [47]:
def cal_moving_avg(input_df, window=2):
    """Calculates the moving average over `window` date columns."""
    # Roll across the date columns (axis=1) and keep the value for 4/1/20.
    _df = input_df.rolling(window, axis=1).mean()['4/1/20']
    
    return _df

def cal_ewma(input_df, comm=0.3):
    """Calculates the exp weighted moving average using center of mass `comm`."""
    # Note: ewm() smooths along axis=0 (down the rows) by default.
    _df = input_df.ewm(com=comm).mean()['4/1/20']
    
    return _df
In [48]:
ind_all_pred_df = pd.DataFrame()
ind_all_pred_df_ew = pd.DataFrame()

for df in [ind_conf, ind_deaths]:
    _df = cal_moving_avg(df, window=2)
    ew_df = cal_ewma(df, comm=0.4)
    ind_all_pred_df = pd.concat([ind_all_pred_df, _df], axis=1)
    ind_all_pred_df_ew = pd.concat([ind_all_pred_df_ew, ew_df], axis=1)
    
ind_all_pred_df.columns =['Confirmed', 'Deaths']    
ind_all_pred_df_ew.columns =['Confirmed', 'Deaths'] 
In [49]:
print('\n')
print('COVID-19 cases in India till 1/4/2020 using MA:')
ind_all_pred_df

COVID-19 cases in India till 1/4/2020 using MA:
Out[49]:
Confirmed Deaths
78 1697.5 46.5
In [50]:
print('\n')
print('COVID-19 cases in India till 1/4/2020 using EWMA:')
ind_all_pred_df_ew

COVID-19 cases in India till 1/4/2020 using EWMA:
Out[50]:
Confirmed Deaths
78 1998.0 58.0
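
The EWMA figures above equal the raw 4/1/20 counts (1998 confirmed, 58 deaths). That is expected for a single-row frame, because `ewm()` smooths along `axis=0` (down the rows) by default and there is only one row to average. The sketch below contrasts this with smoothing along the dates. It uses only the last ten confirmed counts shown in Out[45], so the time-wise value is an approximation of what the full series would give, and the variable names are illustrative.

```python
import pandas as pd

# Last ten confirmed counts for India, copied from the Out[45] row above.
dates = ["3/25/20", "3/26/20", "3/27/20", "3/28/20", "3/29/20",
         "3/30/20", "3/31/20", "4/1/20", "4/2/20", "4/3/20"]
counts = [657, 727, 887, 987, 1024, 1251, 1397, 1998, 2543, 2567]
row = pd.DataFrame([counts], columns=dates, index=[78])

# Default axis=0: smoothing runs down the rows. With a single row the
# result is just the original value.
down_rows = row.ewm(com=0.4).mean()["4/1/20"]

# Transposing puts dates on the rows, so the EWMA is taken over time.
along_dates = row.T.ewm(com=0.4).mean().loc["4/1/20"]

print(down_rows)
print(along_dates)
```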

3. Prediction cases for COVID-19 in US

In [51]:
us_conf = grp_confirm_data_df[grp_confirm_data_df['Country/Region'] == 'US']
us_conf
Out[51]:
Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 ... 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20
169 US 37.0902 -95.7129 1 1 2 2 5 5 5 ... 65778 83836 101657 121478 140886 161807 188172 213372 243453 275586

1 rows × 76 columns

In [52]:
us_deaths = grp_death_data_df[grp_death_data_df['Country/Region'] == 'US']
us_deaths
Out[52]:
Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 ... 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20
169 US 37.0902 -95.7129 0 0 0 0 0 0 0 ... 942 1209 1581 2026 2467 2978 3873 4757 5926 7087

1 rows × 76 columns

Use Moving Average and EWMA models

In [53]:
def cal_moving_avg(input_df, window=2):
    """Calculates the moving average over `window` date columns."""
    # Roll across the date columns (axis=1) and keep the value for 4/1/20.
    _df = input_df.rolling(window, axis=1).mean()['4/1/20']
    
    return _df

def cal_ewma(input_df, comm=0.3):
    """Calculates the exp weighted moving average using center of mass `comm`."""
    # Note: ewm() smooths along axis=0 (down the rows) by default.
    _df = input_df.ewm(com=comm).mean()['4/1/20']
    
    return _df
In [54]:
us_all_pred_df = pd.DataFrame()
us_all_pred_df_ew = pd.DataFrame()

for df in [us_conf, us_deaths]:
    _df = cal_moving_avg(df, window=2)
    ew_df = cal_ewma(df, comm=0.4)
    us_all_pred_df = pd.concat([us_all_pred_df, _df], axis=1)
    us_all_pred_df_ew = pd.concat([us_all_pred_df_ew, ew_df], axis=1)
    
us_all_pred_df.columns =['Confirmed', 'Deaths']    
us_all_pred_df_ew.columns =['Confirmed', 'Deaths']    
In [55]:
print('\n')
print('COVID-19 cases in US till 1/4/2020 using MA:')
us_all_pred_df

COVID-19 cases in US till 1/4/2020 using MA:
Out[55]:
Confirmed Deaths
169 200772.0 4315.0
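
The moving-average figures can be checked by hand: with `window=2`, the value reported for 4/1/20 is the average of the 3/31/20 and 4/1/20 counts. A small sketch of that arithmetic, using the US counts shown in Out[51] and Out[52] (the helper `two_day_average` is illustrative, not part of the notebook):

```python
# Counts copied from the US rows above (3/31/20 and 4/1/20).
confirmed = {"3/31/20": 188172, "4/1/20": 213372}
deaths = {"3/31/20": 3873, "4/1/20": 4757}

def two_day_average(series):
    """Mean of the two most recent values in a date -> count mapping."""
    values = list(series.values())[-2:]
    return sum(values) / len(values)

print(two_day_average(confirmed))  # 200772.0
print(two_day_average(deaths))     # 4315.0
```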
In [56]:
print('\n')
print('COVID-19 cases in US till 1/4/2020 using EWMA:')
us_all_pred_df_ew

COVID-19 cases in US till 1/4/2020 using EWMA:
Out[56]:
Confirmed Deaths
169 213372.0 4757.0

Save the predicted results into excel file

In [57]:
!pip install openpyxl
pred_1_df.to_excel("Submission_COVID19_Globalcases_1.xlsx", index = False)
pred_2_df.to_excel("Submission_COVID19_Globalcases_2.xlsx", index = False)
Requirement already satisfied: openpyxl in c:\users\surya-rekha\anaconda3\envs\myenv\lib\site-packages (3.0.3)
Requirement already satisfied: jdcal in c:\users\surya-rekha\anaconda3\envs\myenv\lib\site-packages (from openpyxl) (1.4.1)
Requirement already satisfied: et-xmlfile in c:\users\surya-rekha\anaconda3\envs\myenv\lib\site-packages (from openpyxl) (1.0.1)
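
As an alternative to two separate files, both prediction tables could be written to one workbook with a sheet per model. This is a sketch only: the helper `save_predictions`, the sheet names, the sample frames and the file name in the commented call are assumptions, and it relies on `openpyxl` being installed as in the cell above.

```python
import pandas as pd

def save_predictions(frames, path):
    """Write each DataFrame in `frames` (sheet name -> frame) to one workbook."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)

# Hypothetical one-row stand-ins for pred_1_df and pred_2_df,
# using the Afghanistan values shown in Out[43] and Out[44].
example_frames = {
    "MA": pd.DataFrame(
        {"Country/Region": ["Afghanistan"], "Confirmed": [205.5], "Deaths": [4.0]}
    ),
    "EWMA": pd.DataFrame(
        {"Country/Region": ["Afghanistan"], "Confirmed": [237.0], "Deaths": [4.0]}
    ),
}

# Example call (hypothetical file name):
# save_predictions(example_frames, "Submission_COVID19_Globalcases.xlsx")
```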